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ORDER SELECTION IN NONLINEAR TIME SERIES MODELS 
WITH APPLICATION TO THE STUDY OF CELL MEMORY^ 

By Ying Hung 

Rutgers University 

Cell adhesion experiments are biomechanical experiments study- 
ing the binding of a cell to another cell at the level of single molecules. 
Such a study plays an important role in tumor metastasis in can- 
cer study. Motivated by analyzing a repeated cell adhesion experi- 
ment, a new class of nonlinear time series models with an order se- 
lection procedure is developed in this paper. Due to the nonlinearity, 
there are two types of overfitting. Therefore, a double penalized ap- 
proach is introduced for order selection. To implement this approach, 
a global optimization algorithm using mixed integer programming is 
discussed. The procedure is shown to be asymptotically consistent 
in estimating both the order and parameters of the proposed model. 
Simulations show that the new order selection approach outperforms 
standard methods. The finite-sample performance of the estimator is 
also examined via a simulation study. The application of the proposed 
methodology to a T-cell experiment provides a better understanding 
of the kinetics and mechanics of cell adhesion, including quantify- 
ing the memory effect on a repeated unbinding force experiment and 
identifying the order of the memory. 

1. Introduction. Cell adhesion plays an important role in many physio- 
logical and pathological processes, especially in tumor metastasis in cancer 
study. Cell adhesion experiments refer to biomechanical experiments that 
study the binding of cells at the molecular level. The binding is mediated by 
specific interaction between cell adhesion proteins, called receptors, and the 
molecules that they bind to, called ligands. The resulting bond is called the 
receptor-ligand bond. There are various types of measurements in the cell 
adhesion experiments to study different aspects of the binding, such as the 
binding frequency and bond lifetime measurements [Zarnitsyna et al. (2007), 
Huang et al. (2010)]. This research is inspired by analyzing a specific type 
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Fig. 1. Illustration of the biomembrane force probe. 



of cell adhesion experiment known as the unbinding force assay [Marshall 
et al. (2003, 2005)]. 

Receptor-ligand bonds that mediate cell adhesion are often subjected to 
forces that regulate their dissociation; therefore, an important issue is to 
study the unbinding force of a receptor-ligand bond. To address this issue, 
the unbinding force assay is developed by using a high-tech version of the 
micropipette known as the biomembrane force probe [Chen et al. (2008)]. 
A biomembrane force probe is illustrated in Figure 1 where a probe bead 
(left) is attached to the apex of the micropipette-aspirated red blood cell to 
allow tracking of the deflection of another cell (right). Figure 2 illustrates 
one cycle of the unbinding force assay. It includes an approaching stage 
where the probe bead and the T-cell are brought into contact. In the next 
stage, the touch of the two subjects is controlled with a given contact time so 
that a receptor-ligand bond might occur. In the last stage, the probe bead 
and the T-cell are retracted at a constant rate until they go back to the 
unbinding position that indicates the bond failure. The y-axis in Figure 2 
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Fig. 2. One cycle of the unbinding force experiment. 
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represents the applied force in the foregoing process. The unbinding force is 
measured by the force difference observed at the point of bond failure. 

Two interesting questions are raised in analyzing the repeated unbinding 
force tests where the unbinding force assay (i.e., approaching, contact and 
retraction) is performed repeatedly for each pair of experimental units, in- 
cluding a T-cell and a probe bead attached to a red blood cell. Such repeated 
assays are conducted for different pairs of units as replicates. The objective 
of the experiments is to study the dependence of the repeated unbinding 
force measurements because it was discovered recently that cells appear to 
have the ability to "remember" the previous adhesion events. Zarnitsyna 
et al. (2007), Hung et al. (2008) and Huang et al. (2010) demonstrated that 
in some biological systems the occurrence of binding in the immediate past 
assay could either increase or decrease the likelihood for the next assay to 
result in a binding. Such memory effects can affect not only through the 
binding frequency but also the unbinding force. Hence, the first question is 
how to model the memory effect on the repeated unbinding force assays. 
Apart from this, different receptor-ligand bonds can have a different order 
of the memory due to their string strength difference. Specifying the order 
of the memory for receptor-ligand bonds is important because it can be used 
to classify the bonds into groups for further biological study. Therefore, the 
other question is how to identify the order of the memory. 

To answer the foregoing questions, a naive approach is to study the mem- 
ory on the unbinding force by a time series model. However, the standard 
time series models cannot be applied directly. The reason is as follows. Due 
to the inherent stochastic nature of single molecular interaction, any partic- 
ular assay has two random outcomes, either a receptor-ligand bond occurs 
or not. An unbinding force is representative and the resulting memory ef- 
fects are considered only if the corresponding assay is associated with the 
occurrence of a bond. Theoretically, a distribution function might be used 
to capture the chance of a bond formation with respect to unbinding force. 
However, the related studies are mainly developed based on the independent 
assumption on the repeated adhesion experiments [Marshall et al. (2005)]. 
Being the first attempt to study the memory, we assume that the occurrence 
of a bond is determined by having the unbinding force above some threshold, 
which can be interpreted as the average unbinding force for bond dissocia- 
tion. That is, if a bond occurs during the contact, the unbinding force would 
be larger than some threshold. The threshold, however, is unknown and has 
to be estimated from the data because of the detection limits and measure- 
ment errors. For example. Figure 3 is an example of the experiments with 
20 repeated unbinding force assays generated from Hung et al. (2008). For 
each cycle of the assay, unbinding forces can be easily measured as described 
in Figure 2. A threshold has to be determined so that time series models 
can be applied to those forces that are above the threshold. Failing to in- 
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Fig. 3. The measurements from the repeated unbinding force assays. 

elude sueh a threshold term can lead to a systematic bias in the successive 
adhesion assays. Because of the unknown threshold, conventional time se- 
ries modeling techniques cannot be used. Furthermore, to identify the order 
of the memory, a new order selection approach that takes into account the 
foregoing features is called for. 

A new time series model is proposed in this article to study the memory 
effect on the repeated unbinding force assays. It is a multiple nonlinear time 
series model with an unknown threshold parameter. Even though there are 
numerous studies on nonlinear time series modeling [Tong and Lim (1980), 
Tsay (1989), Fan and Yao (2003)], most of them are developed based on 
a single series of observations and focus on the situation where nonlinearity 
is determined by a particular variable. For example, the threshold autore- 
gressive model [Tong (1983, 2007)] is constructed for a single series of obser- 
vations with a delay parameter indicating the variable where the threshold is 
applied. The proposed nonlinear model is different from the existing nonlin- 
ear time series models in that there is no specific delay parameter involved. 
Instead, the threshold is applied to all the historical observations. More- 
over, there is a hierarchical structure imposed upon the nonlinear model 
that makes the model more interpretable. Besides, this model handles mul- 
tiple time series by incorporating random effects to take into account the 
heterogeneity among experimental units. 

Identifying the order of the memory is equivalent to specifying the correct 
order of the proposed time series model. This is different from standard or- 
der selection problems because there are two types of overfitting associated 
with the proposed nonlinear time series model. Thus, a double penalized ap- 
proach is developed and a global optimization algorithm using mixed integer 
programming (MIP) is introduced to implement this approach. The order se- 
lection consistency and asymptotic properties for the proposed method are 
discussed. The discontinuity of the conditional mean function of the new 
model results in nonstandard asymptotics for the estimators. 




vS 



200 T-cell/ptobe 400 

bead contact 



NONLINEAR TIME SERIES MODELING 



5 



Although the methodology is motivated by the analysis of biomechan- 
ical experiments, it can be applied to a wide variety of studies, such as 
longitudinal data analysis [Diggle et al. (2002)], econometrics and influenza 
modeling. For example, in influenza modeling [Hyman and LaForce (2003)] , 
the proposed method can be applied to model the spread of a disease, such 
as SARS. Because an epidemic threshold is used to indicate the take off and 
die out of an epidemic, the spread of the disease is of interest only when 
the threshold is reached, such as the infected population exceeding some 
amount. These thresholds are often unknown and estimated from the data. 
Therefore, the proposed model can be desirable for these studies. 

The remainder of the paper is organized as follows. In Section 2 the non- 
linear time series model is introduced. The estimation and order selection 
procedures with a global optimization algorithm are introduced. In Section 3 
the order selection consistency and some asymptotic properties of this model 
are discussed. The performance of the new model and the order selection 
procedure is demonstrated via simulations in Section 4. The proposed model 
is applied to an unbinding force assay in Section 5. Summary and concluding 
remarks are given in Section 6. 



2. New class of nonlinear time series models. 



2.1. Modeling. A new multiple nonlinear time series model is introduced 
in this section. Assume yu represents the unbinding force observed from 
the ith subject at time t, where i = 1, . . . ,n, t = 1, . . . ,m and the sample 
size N = mn. Define r as a threshold parameter. Having the unbinding 
force above r indicates that the corresponding contact results in a receptor- 
ligand bond and no bond otherwise. A random effect a = (ai,...,a„) is 
incorporated to take into account a variety of situations with the multiple 
time series, including subject heterogeneity, unobserved covariates and other 
forms of overdispersion. The random effects Qj's are assumed to be mutually 
independent and normal distributed with mean and variance in this 
paper. The following model is proposed to quantify the memory effect on 
the unbinding forces that are associated with receptor-ligand bonds: 

■ai + f3o + eit, if2/i,t_i<r, 
■ai + f3o + PiVi^t-i + £it, if yi,t-i > T, yi^t-2 < t, 



(1) 



Vit ■ 

yu - 
yu - 



+ P2yi,t^2 + eu, 



if Vi^t-i > T, > T, < T, 



yu = a-i + l3o + (iiyi,t-i 

H h /3kyi,t-k + £u, if yi,t-i > T, . . . , yi,t-k > t, 

where /3j's are the fixed effects and the error terms en are independent with 
distribution N{0,a'^). 
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The first equation in (1) corresponds to the situation where no receptor- 
hgand bond occurs in the previous test (i.e., t/i^t-i I^t)- It amounts to mod- 
eling the unbinding forces in a sequence of independent adhesion tests. Let 
the mean unbinding force be Pq. The estimated value for /3o is the aver- 
age unbinding force in independent adhesion assays and can change with 
different settings of the experimental variables, such as different contact 
durations. Extensions can be easily achieved by incorporating these exper- 
imental variables into the model. The second equation in (1) describes the 
unbinding force when a receptor-ligand bond occurs in the previous test (i.e., 
Ui^t-i > t) but no bond in (i-e., yi,t-2 ^t)- In this situation, a mem- 

ory could be carried over from the previous observations. Thus, a first-order 
autoregressive model is considered. This autoregressive modeling continues 
to the previous k assays. Similar interpretation can be given to the rest of 
the model. The value k represents the upper bound of the memory order; 
detailed discussions on identifying the order of the memory are given in 
Section 2.2. 

The above model can be written in a concise form as follows: 



Vit = z-Q + /3o + I3iyi,t-il[yi,t^i >t]+ /32yi,t-2l[yi,t-i > r, y^-2 > t] 



= g{(3,T,a'^ I Hit) + eit, 

where I{yi^t-i > t) is an indicator function which takes value one iiyi^t-i > t 
and zero otherwise. The fixed effects are denoted by /3 = (/3o, /3i, . . . , the 
information from previous observations are included in Ha = {l,yi^t-i, ■ ■ ■ , 
yi^t-k), and Zj = . . . , -Zj,^}' is the design matrix for the random effects cx 
such that z'^a = Oj. Since the proposed model is not limited to the analysis 
of unbinding force assay, the random intercept alone may not be sufficient to 
capture the variation exhibited in other applications. Hence, we use a general 
random effect structure hereafter. We call this new nonlinear time series 
model the multiple threshold autoregressive (MUTARE) model. 

The MUTARE model is very general and includes an interesting special 
case with a single series of observations. Assuming that the time series ob- 
servations are yt, t = 1, . . . ,m, the special case of the MUTARE model can 
be written as 



This is different from the conventional nonlinear time series models. The 
closest model in the literature is the threshold autoregressive models intro- 
duced by Tong (1983, 1990). There are various extensions of the threshold 
autoregressive models [Samia, Chan and Stenseth (2007)] and the nonlin- 
earity therein is determined by a particular variable with which the thresh- 
old parameter is defined. The MUTARE model, however, has the threshold 



(2) 



H h I3kyi,t-kl[yi,t~i > T, . . . , yi^t-k > r] + eu 



(3) 



yt=l3o + /3iyt^il[yt^i > r] H 

+ I3kyt-kl[yi,t^i > T, . . . , yt~k > r] + . 
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applied to all the historical observations. Furthermore, different from the 
threshold autoregressive model where piecewise linear submodels are fitted 
separately, a hierarchical structure is imposed upon the submodels in MU- 
TARE as illustrated in (1), which makes the model easier to interpret. 

2.2. Estimation and order selection procedure. A crucial step in this 
study is to specify the order of the memory, denoted by ko . This is an order 
selection problem but different from standard ones in that there are two 
types of overfitting. By maximizing the log likelihood function, the resulting 
model may overfit the data with some small values of nonzero /3j's (type I 
overfitting) and/or with a large estimated order (type II overfitting). This is 
not surprising given the same problem experienced in estimating parameters 
in finite mixture models [Chen and Khalili (2008)]. Therefore, we propose 
to penalize type I overfitting by a function Pxid/^fcl) and penalize type II 
overfitting by the estimated order (maxjjj : f3j 7^ 0}). The reason to consider 
type II overfitting is because the MUTARE model has a hierarchical struc- 
ture as shown in (1). Once the order of the model (i.e., max^jj : /3j 7^ 0}) is 
determined, all the previous equations have to be considered. So a double 
penalized likelihood is defined as 



(4) pl(/3, a^T) = 2 log L(/3, a^r) -Y,Px^m) - X2 max{j : ^ 0}, 



where L is the likelihood function. By maximizing (4), the solutions, (3 and 
maxjjj : /3j 7^ 0}, are the estimated parameters and order of the memory. 

To prevent the first type of overfitting, there are different penalty func- 
tions discussed in the literature [Donoho and Johnstone (1994), Tibshirani 
(1996, 1997), Fan and Li (2001)]. Here we focus on the adaptive Lasso [Zou 
(2006)] where (|/3j |) = Ai^'j|/3j| and i/i,...,i'fc are known weights. The 
specification of I'j can be fairly flexible and more discussions can be found 
in Zou (2006). We consider a weight vector suggested in Zou (2006) with 
z>j = where p> and /3j is a root-re-consistent estimator of /3j. In 

Hung (2011), it is shown that the MLE of /3 is root-re-consistent under 
model (2), therefore it can be applied. 

By the following proposition, we can have a closer look at how the double 
penalized approach works. The proof is straightforward and is omitted. 

Proposition 1. The penalized likelihood function in (4) is equivalent to 



k 



k k 



pl(/3, a\T) = 2 log L(/3, r) - PxAW) - ^i^J + 



(5) 
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Equation (5) connects the penalty for type II overfitting with the Lq 
penalty, which directly controls the number of nonzero coefficients in the 
model. Therefore, the double penalized approach is closely related to a com- 
bination of Lq and Li penalties, which is carefully studied by Liu and Wu 
(2007) and found to deliver better variable selection than the Li penalty 
while yielding a more stable model than the Lq penalty. 

2.3. Mixed integer programming. In this section a global optimization 
algorithm is introduced using the idea of MIP. MIP is an active research 
area in operations research with many applications. The objective here is to 
solve /3j's by maximizing the double penalized likelihood function (4). It is 
achieved by the following proposition. 



Proposition 2. The penalized likelihood function in (4) is equivalent to 

k 



pl(/3, a^T)=2 log L(/3, a^, r) - ^ Px, 



i=i 

-A2 5^(l-/(/3, = --- = A = 0)). 

As discussed in Proposition 2, this problem is equivalent to the maximiza- 
tion of (6). Substitute variable /3j by two nonnegative variables /3j~ and /?~ 
with /3,- = Bf — B~ . Then, we have 1/3 J = Bf + B~ and the maximization 
problem in (6) can be converted into a MIP problem with maximization of 

k k 

2 log L{(3+ -(3-,a\T)-Y^ Px, + Pj) - X2J2 

subject to 

/3+ + /3f + /32+ + /32" + • • • + + < Mzi, 
fi+ + p- + ... + P+ + p-<Mz2, 



/3+,/3->0, i = l,...,fc, 
Zj G {0,1}, 

where M is a very large constant and we can choose it to be the smallest 
upper bound of |/3j | if the prior knowledge is available. In the simulations, 
we apply the setting M = 50 and it works reasonably well in practice. In 
general, M can be even larger (e.g., M = 1000) for those problems with 
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large k. Note that since (3^ + Pj are to be minimized, (3^ and Pj would not 
be both positive in the optimal solution. 

To solve the foregoing MIP problem, there are numerous methods such 
as the most popular branch-and-bound algorithm. More details about algo- 
rithms and the related issues can be found in Nemhauser and Wolsey (1999). 
The examples we considered in this article are solved by the C language 
with a GLPK package (available at http : //www . gnu . org/ sof tware/glpk). 
Some other commercial optimization software such as CPLEX is also avail- 
able to solve such a problem. The complexity of MIP can be considerably 
affected by introducing too many integer variables (i.e., Zj^s), but it is in gen- 
eral not a critical concern. This is because the number of integer variables 
incorporated increases with the order A:, and it is usually in a manageable 
size in this application. For other applications with a large value of A;, one 
can obtain a reasonably good solution (not necessarily optimal) by setting 
a restriction on the computing time to achieve efficiency. 

Next we discuss the choice of the tuning parameters, Ai, A2 and p. There 
are different approaches available in the literature for selecting tuning pa- 
rameters [Stone (1974), Craven and Wahba (1979), Fan and Gijbels (1996)]. 
Burman, Chow and Nolan (1994) introduced the /i-block cross-validation for 
dependent data. The idea is to modify the leave-one-out cross-validation and 
reduce the training set by removing the h observations preceding and follow- 
ing the observation in each test set. Such blocking allows near independence 
between the training and test set. This approach is further improved by 
Racine (2000) to achieve asymptotic consistency. That is, instead of leave- 
one-out, the size of the validation set is increased to riy. So the training 
set has size Uc and + Uc + = m — k. In this paper, we implement 
Racine's approach with the setting h = [m — k)/A and ric being the integer 
part of mP'^ , which appears to work well in a wide range of situations in 
practice [Racine (2000)]. 

The rest of the parameters can be estimated by the standard maximum 
likelihood approach. Denote the observation by vector Y = (yi, . . . , y„)', 
where the observations for subject i are denoted by = (yji, . . . , yim)' ■ Given 
the historical information Ha and the random effects, the associated likeli- 
hood as a function of the fixed effects /3 and the threshold parameter can 
be written as 

n m 

L{(3,t\ cx) = Y{Y{l{yit I a, Hit), 
1=1 t=i 

where /(•) is the likelihood for each observation yu given a and the cor- 
responding historical information. Considering the normality of the error e 
and random effects a, the joint log likelihood can be easily derived as 

21ogL(/3,a2,r) 

(7) 

= -log|W| - {Y-g{f3,T,a'' \ H)yW-\Y - g{P,T,a' \ H)), 
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where g{(3, r, cr^ | H) is the mean vector, H = {H[, . . . , H!^)' , Hi = {H[^, . . . , 
H''^)' , Z is the design matrix for the random effects with rows z'-, and 
W = (7^1 + a'^ZZ' . Note that cr^ is assumed to be known for notational con- 
venience. The variance component o"^ is estimated by maximizing the orig- 
inal hkelihood throughout the paper and the estimator can be further im- 
proved by the restricted maximum hkehhood [McCuhoch and Searle (2008)]. 
Such a version of the variance components developed for the linear mixed 
model can be easily extended to the multiple threshold autoregressive model 
so that the estimated variance component is invariant to the values of the 
fixed effects and the degrees of freedom for the fixed effects can be taken 
into account implicitly. 

3. Large sample properties. The consistency of the order selection pro- 
cedure and the asymptotic properties of the resulting estimators in the 
MUTARE model are studied in this section. The parameter space of 7 = 
(/3,T, cr^) is denoted by Q, and the true parameter is denoted by 79 = 
(/3o, To, ctq). Assumptions and proofs are deferred to the Appendix. 

Lemma 1 shows that the maximum penalized likelihood estimator for the 
MUTARE model is stochastically bounded. 

Lemma 1. Under Assumptions AI-A4, there exists a v > such that, 
for m and m sufficiently large, the maximum penalized likelihood estimator of 
the parameter 7 = (/3, r, cr^) lies in a compact space Oi = {7 G : I7 — 7o| < v} 
almost surely. 

The convergence rate of the estimated threshold parameter is derived in 
Theorem 1 for the MUTARE model. This result is analogous to Chan (1993) 
for the least squares estimator of the threshold autoregressive model. Not 
surprisingly, the estimated threshold parameter in the MATARE model has 
a fast convergence rate [0(1/A^)] which is similar to that in the threshold 
autoregressive model, and the fast convergence rate is also due to the dis- 
continuity of the conditional mean function [Chan (1993), Hansen (2000)]. 
Note that, as a special case, the estimated threshold parameter in (3) obtains 
a convergence rate 0(l/m). 

Theorem 1. Under Assumptions AI-A4, the maximum likelihood esti- 
mator of the threshold has the property that f = tq + Op{l/N), based on the 
MUTARE model. 

Define H = {H[, . . . , H'J, Hi = {H[^, . . . , H[J, and 

Hit = (1, yi,t-il{yi,t-i > r), . . . , yi,t-kl{yi,t-i > t, . . . , yi^t^k > t)). 
Let Pq = (/3|j^^, /3'^2))'' where P'^-^^-^ is a vector with all the nonzero parameters 
and the rest of the parameters are denoted by P'(^2)- Furthermore, assume 
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^ ^ — >• A, where A is positive definite and can be written as 



A: 



All Ai2 
A21 A22 



according to (3'^^-^-. and /3'^2) • 

In the next theorem, we show that the penahzed hkehhood estimator 
of /3 enjoys the oracle properties [Fan and Li (2001)], which indicates the 
consistency in variable selection and the asymptotic normality. This result 
also implies the order selection consistency of the proposed order selection 
procedure. 

Theorem 2. Suppose that Xi/^/N and \iN^p~^'>^'^ — )■ 00. Under 
Assumptions AI-A4, for any r], < r] < 00, the maximum penalized like- 
lihood estimator of (3 in the MUTARE model satisfies the following two 
properties as n —t- 00 and m — )• 00 ; 

(i) /3(2) = with probability 1, 

(ii) sup|,_^^,|<^/^^|^,_^.|<^/^ViV(^(i) -/3(i)) ^'^iV(0,Ar/). 

Apart from the fixed effects, asymptotic distributions of the estimated 
variance components deserve more investigation. Numerous works have ap- 
peared in the literature addressing methods of variance component estima- 
tion in linear models and the associated asymptotic properties [Jiang (1996), 
McCulloch and Searle (2008)]. Strong consistency of the estimated variance 
component in nonlinear mixed effect models [Nie (2006)] is expected to be 
extended to the MUTARE model. A rigorous theoretical proof along the lines 
of Nie (2006) is not attempted here, and remains the subject of ongoing the- 
oretical work. However, it is briefly noted that the asymptotic conditions, 
such as Assumptions A3 and A4, required for the results here are indeed 
met by the requirement in Nie (2006). The requirement of n — )■ 00 for the 
main theorems is based upon the asymptotic study in Nie (2006) and it is 
expected to be further relaxed by the techniques developed in Jiang (1996). 

4. Finite-sample performance and empirical application. In this section 
simulations are conducted to examine the finite-sample performance of the 
proposed models. Two examples are considered. The first example demon- 
strates the performance of the estimators in the MUTARE model and the 
second example compares the double penalized order selection procedure 
with a standard approach. 

4.1. Example 1. Consider the following MUTARE model with k = 2: 

Vit = ai + /3o + liiyi^t-il[yi,t-i > r] + P2yi,t~il[yi,t^i > yi,t-i >t]+ £it. 
The coefficients of this model are fixed at 7q = (/3q, 0.1, 0.5), where the fixed 
effects are /3q = (0, 0.5, 0.4). The random error eu is generated from a normal 
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Table 1 

Summary of smulation results in example 1 





T 


/3o 


/3i 


/32 








m — 30, 


n = 10 






Mean 


0.114 


0.038 


0.491 


0.390 


0.385 


sd 


0.029 


0.174 


0.083 


0.080 


0.188 


CP 




0.906 


0.859 


0.866 








m = 40, 


n = 15 






Mean 


0.111 


0.035 


0.484 


0.393 


0.431 


sd 


0.028 


0.155 


0.058 


0.047 


0.140 


CP 




0.915 


0.868 


0.889 








m — 60, 


n = 25 






Mean 


0.105 


0.036 


0.501 


0.398 


0.477 


sd 


0.019 


0.123 


0.041 


0.032 


0.090 


CP 




0.918 


0.878 


0.898 




True 


0.1 





0.5 


0.4 


0.5 



distribution with mean and variance 0.5. The sample size combinations 
used are (m = 30, n = 10), (m = 40, n = 15), and {m = 60, n = 25). For each 
combination, the simulations are conducted based on 1000 replicates. In this 
example, tuning parameters are determined by minimizing the mean squared 
prediction error of new generated testing data with the same size and then 
fixed for all the replicates. 

The simulation results are reported in Table 1. For each sample size com- 
bination, the sample means and standard deviations of the estimates are 
listed. The empirical coverage probabilities of the fixed effects, denoted by 
"CP," are listed in the last row of each setting. They are calculated based 
on the 90% confidence intervals of the corresponding regression parameters. 
As shown in the table, the sample mean of the estimates becomes closer to 
the true value and the associated standard deviation becomes smaller as the 
sample size increases. These results confirm the asymptotic consistency dis- 
cussed in Section 3. Moreover, when the sample size increases, the empirical 
coverage probabilities for the fixed effects are closer to the nominal coverage 
probabilities. 

To assess the asymptotic normality, normal Q-Q plots are reported in 
Figure 4. It is plotted based on the three estimated fixed effects, /?2 
and /33, with the sample size combination m = 60 and n = 25. In general, 
the data points being close to straight lines in the Q~Q plots confirms that 
the estimates are normally distributed. 
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Fig. 4. Normal Q~Q plots m example 1. 

4.2. Example 2. In this example we study the performance of the pro- 
posed order selection procedure. Since there is no existing approach avail- 
able, we compare the double penalized approach with a naive Akaike infor- 
mation criterion [AIC; Akaike (1973)], which is suggested for order selection 
in the threshold autoregressive models [Tong (1980)], and the Bayesian in- 
formation criterion [BIC; Schwarz (1978)]. Three different models following 
equation (3) are considered with parameters given in Table 2 and sample 
size 200. The threshold parameters are assumed to be 0.01 and the random 
errors are generated from a normal distribution with mean and variance 
0.1. The tuning parameters are determined as in example 1. 

Table 3 shows the order selection performance of AIC, BIC and the dou- 
ble penalized approach. The column fco indicates the true order. For both 
methods, we report the percentage of times that the estimated order equals 
a number of values (i.e., 1 to 5) out of 1000 replicates. The numbers with 
boldface indicate the most selected orders. For model 1, all the three meth- 
ods select the right order with their highest frequency. The double penal- 
ized approach and BIC perform equally well in this model and both of 



Table 2 
Parameter values in example 2 



Model 


/3i 


/32 


/33 


/34 


/35 


1 


0.4 


0.4 











2 


0.5 


0.3 


0.1 








3 


0.3 


0.2 


0.1 


0.05 
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Table 3 
Simulation results in example 2 











AIC 






Model 




1 


2 


3 


4 


5 


1 


2 


0.178 


0.580 


0.193 


0.014 


0.020 


2 


3 


0.142 


0.574 


0.150 


0.101 


0.031 


3 


4 


0.522 


0.325 


0.111 


0.042 


0.000 










BIC 






Model 


fco 


1 


2 


3 


4 


5 


1 


2 


0.001 


0.749 


0.152 


0.088 


0.001 


2 


3 


0.243 


0.536 


0.151 


0.058 


0.012 


3 


4 


0.553 


0.322 


0.110 


0.015 


0.000 










Double penalized 






Model 


fco 


1 


2 


3 


4 


5 


1 


2 


0.103 


0.751 


0.091 


0.050 


0.004 


2 


3 


0.000 


0.053 


0.659 


0.167 


0.121 


3 


4 


0.023 


0.081 


0.248 


0.645 


0.003 



them perform better than AIC. For example, the double penalized approach 
has a 30% [= (0.751 — 0.580)/0.580] higher chance to select the right order 
compared with AIC. For models 2 and 3, both AIC and BIC tend to un- 
derestimate the order and the double penalized approach selects the correct 
order with probability higher than 65%. These results indicate that the dou- 
ble penalized approach outperforms the other two methods in terms of order 
selection. The computational efficiency of the double penalized approach is 
reasonably close to AIC and BIC in the simulation. The average computing 
times are 4.38 seconds for AIC, 4.45 seconds for BIC and 4.92 seconds for 
the double penalized approach. 

5. Application in unbinding force experiments. In this section we revisit 
the repeated unbinding force experiments and apply the proposed method 
to study the memory effect on such repeated assays. There are 15 pairs 
of experimental subjects and each pair includes a T-cell and a probe bead 
attached to a red blood cell as described in Figure 1. For each cell adhesion 
cycle, a T-cell and a probe bead are brought into contact (i.e., touch) for 4 
seconds and then retracted to the unbinding position (see Figure 2). Such 
a cycle is performed repeatedly on the same pair of experimental subjects 
for 50 times. Figure 5 is three randomly selected samples of the repeated 
unbinding forces from such experiments. For each sample, the forces are 
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Fig. 5. The measurements from the repeated unbinding force assays. 

plotted based on observations in 1000 seconds with 50 repeated adhesion 
cycles completed. 

The unbinding forces are collected according to the definition in Figure 2. 
Prior knowledge [Zarnitsyna et al. (2007), Hung et al. (2008)] indicates that 
a reasonable order of the memory in this process should be less than 5. 
Therefore, we first fit the MUTARE model with k = 5 and then the double 
penalized order selection procedure is applied. The order of the memory is 
identified as two and the memory effect on the repeated unbinding force 
experiments can be quantified by the MUTARE model as 

m = ai + OMbyi^t-iIiVi^t-i > t] + 0.1lyi^t^2l[yi,t~i >T,yi^t~2 > t], 

where i = 1, . . . , 15, t = 1, . . . , 50, the random effect ai follows normal distri- 
bution with mean —0.072 and variance 0.389. The estimated order of the 
memory in this experiment is consistent with that in Hung et al. (2008) with 
a similar setting but different measurements. Such consistency provides im- 



16 



Y. HUNG 



portant evidence of a unified underlying kinetic mechanism in the adhesion 
process. The estimated threshold, f = 0.089, indicates that an adhesion leads 
to a bond only if the unbinding force is larger than 0.089pA^. Based on this 
result, the occurrence of a bond, although unobservable, can be easily stud- 
ied by measuring the corresponding unbinding forces. Since random effects 
are considered, the fitted model can be used to make inference beyond the 
15 pairs of experimental subjects. 

6. Summary and concluding remarks. Despite numerous results avail- 
able in modeling nonlinear time series, their applications are limited. For 
example, they are mainly constructed for a single series of observations and 
focus on the case where the nonlinearity is determined based on one variable. 
Furthermore, there is no order selection procedure available with theoreti- 
cal justification for such models. Motivated by the analysis of the repeated 
unbinding force experiments, a new nonlinear time series model, MUTARE, 
and a double penalized order selection procedure are introduced. 

The proposed model handles multiple time series by incorporating ran- 
dom effects to borrow strength across different subjects. Thus, inference and 
predictions can be made beyond the experimental units in the study. More- 
over, the proposed methodology provides a new nonlinear time series model 
that is easy to interpret and captures the autoregressive behavior of the ob- 
servations above some unknown threshold. The double penalized procedure 
can be used to efficiently identify the order and can be easily implemented 
by a global optimization algorithm using mixed integer programming. The 
selection consistency and asymptotic normality of the estimators are de- 
rived. Apart from the asymptotic results, the finite-sample performance is 
examined via simulations. 

As an application, the MUTARE model is illustrated by modeling the 
memory effect on the repeated unbinding force assays. The fitted model 
provides a better understanding of how force regulates receptor-ligand in- 
teractions. This work is one of the first few studies considering memory 
effects in the cell adhesion experiments. More studies are needed to con- 
struct a rigorous and interpretable biological model. An ongoing project 
includes theoretical development for the estimated threshold, relaxation of 
the constant threshold assumption, and taking into account important pro- 
cess variables, such as contact duration, into the model. 

APPENDIX A: ASSUMPTIONS 

Assumption A1. The process yu is stationary, ergodic and has finite 
second moments. 

Assumption A2. The autoregressive function is discontinuous, that is, 
there exists a H* = (1, y^_-^^, yl_^?) such that H*{As - At) / and yt-i = 
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••• = yt-j = T, where Ai = (/3o, 0, . . . , 0)', . . . , = (/3i, . . . , /3fc)', {s,t) G 
(1,...,A;- 1), and j = l,...,A;. 

Assumption A3. There exists a Mi > such that £'[tr(Wj~^ZiZ- x 
Wr^ZiZl)]^ < Ml, and E[tr(W-^Z,Zi) - {y^-g{Hi, /3o, tq, a^)yy^T^ ZiZ[ x 
W~^(yi — /3o) To, f^)]^ ^ -^1 for all i, where Wj and are the matrices 
of the covariance and random effects for the ith subject. 

Assumption A4. liminf„_^oo A„ = A > 0, where is the smallest eigen- 
value of -lY.^EMWT^Z,Z'^T^Z,Z'^]. 

Assumptions Al and A2 are necessary for the strong consistency of the 
fixed effect and threshold parameter estimators. Assumptions A3 and A4 are 
used for the strong consistency of the variance components. More discussions 
can be found in Nie (2006). 

APPENDIX B: PROOF OF LEMMA 1 
The proof relies on verifying the following two claims. 

Claim 1. There exists a M2 > such that, for m and n sufficiently large, 
the maximum likelihood estimator of 7 lies in = {7 G : |/3i — f3ifi\ < 
M2, . . . , |/3fc — /Sfc.ol < M2, |cr^ — ctqI < M2} almost surely. 

Verification of Claim 1. Recall 79 = (/3o,r, do) and define (Sq = 
(/3o,0) • • • if^kflY ■ To prove Claim 1, it suffices to show that for m and n suffi- 
ciently large and uniformly for 7 not belonging to O2, we have (mn)~^ (pKt) ~ 
pl(7o)) < almost surely: 

pl(7) - pl(7o) ^ pl(/3, r, a^) - pl(/3o, tq, a^) 

, , mn mn 

^ pl(/3o,ro,o-^) -pl(/3o,To,crg) 
mn 

We first examine the first part on the right-hand side of (8). Assuming that 
the variance component is consistent along the lines of Nie (2006), the study 
of the first part can be transformed into the study of Y* = W~^' ^(y — Za), 
which is used in the derivation for both pl(/3, r,cr^) and pl{(3Q,TQ,a'^). We 
have 

pl(^, r, ct2) - pl(/3o, To,a^) = 2 log L(/3, r, a^) - 2 log L(/3o, tq, a^) 

k 

+T.iPx^m,o\) - PxA\m 

+A2 max{j : Pj^q / 0} - max{j : / 0} . 
L i j J 
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First, up to an additive constant, we have 

2 log(/3, r, a^) = -Y^Y.{y*^- g(p, r, \ Hu)f . 
i t 

Due to the nonhnearity, the derivation for a general MUTARE model can 
be lengthy in nature. Therefore, we illustrate the detailed derivation by 
a smaller model and consider the case where r > tq. The same argument 
can be easily applied and extended to the MUTARE model and the case 
r < To in general. 

Consider a MUTARE model with k = 2\ 

' yit = ai + f3o + eit, if2/i,t_i<T, 
(9) < yu = ai + f3o + PiVi^t-i + en, if 2/i,t-i > t, 2/i,t_2 < t, 

_ Vit = ai + (3q + PiVi^t-i + hyi,t^2 + eu, if yi,t-i > T, 2/i,t_2 > t, 

the corresponding log likelihood function can be decomposed by 

21ogL(/3,T,a2) 

i t 

- Yl JZ'^yit ~ /^0)^^[t-0 < yi,t-l < T, yi^t-2 < M 

i t 

- X] 'Y'yyit - /3o)^^[to < yi,t-i < r, yi,i_2 > To] 
i t 

-JZJZ'^yii - /3o - l^iylt-if I[yi,t-i > T,yi^t^2 < To] 

i t 

-^^hJa - Po- l^iylt-if I[yi,t-i > T,ro < yi^t-2 < r] 
i t 

- Y Y{y*it - Po- Piylt-i - p2ylt-2fi[yi,t-i > t, yi,t-2 > r] 

i t 

= Rii(3,T,a^) + --- + ReiP,T,a^). 

Defining Ai = (/3o,o, 0, 0)', A2 = (/3o,o, A.o, 0)', ^3 = (/3o,o, /3i,o, /32,o)', = 
(/3o,0,Oy, 52 = (/3o,/3i,0)', and S3 = (/3o, A, /32)', we have 

R^{p,T,a^)-Ri{Po,To,a^) 

= E E[-(^** - /5o - Piylt^i? + {yu - /3o,o - Pi,oylt-i?] 



(10) 



X I{yi^t^i > T,yi^t^2 < To) 
Y.Y.^-{y*t - Hl^.B^f + {yl - Hl^.A 



t 



X I{yi,t-i > T,yi^t-2 < To) 
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it ' ^ ' 

X I{yi,t^i > T,yi^t-2 < To) 



Therefore, based on the uniform law of large numbers [Pollard (1984), page 8], 
we have 

— (pl(Ar,a2)-pl(/3o,ro,a2)) 



mn 



< 2{\Bi - ^il + 1^1 - A2I + \Bi - Asl 

+ \B2-A2\ + \B2 -A-sl + lBs- AsDe 
- {\Bi -Ai\^ + \Bi - A2\^ + \Bi - As\^ 

+ \B2 -A2\^ + \B2 - As\^ + 1^3 - A3\^){K - e) 

+ {mn)-'iY.^P,M,o\)-PxA\m 
{j=i 



+ A 

2eAi -A2(i^-e) + A, 



max{ j : / 0} - max{ j : pj / 0} 



where 



2 



P i<j \\ \Bi-Aj\ J ' 

and lij is the corresponding indicator function as listed in (10). Note that 
the uniform law of large numbers in Pollard [(1984), page 8] assumes that the 
data are independent and identically distributed. This assumption is relaxed 
to a stationary ergodic process by Samia and Chan (2011). Therefore, the 
uniform law of large numbers can be applied here. Based on the Cauchy- 
Schwarz inequality, we have Ai < \/6A2 < 6A2 for sufficiently large M2. For 
sufficiently large m and n, A3 < eA2. Thus, by selecting e < K/IA, it follows 
that {mn)-^{l{P,T,a^) - l{f3o,To,a^)) < 0. 

For the second term on the right-hand side of (8), under Assumptions A3 
and A4, the maximum likelihood estimator of the variance component al- 
most surely converges based on the results in Nie (2006). Therefore, we have 
(mn)~-'^(Z(/3Q, To, o"^) — ?(/3o, tq, do)) <0 and Claim 1 follows. 

Claim 2. There exists a M3 > such that, for m and n sufficiently large, 
the maximum likelihood estimator of •y lies in 0,^ = {•y £Q2-\t — to\ < M3} 
almost surely. 
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Verification of Claim 2. Similar to Claim 1, it suffices to show that, 
for m and n sufficiently large, (mn)^^ (pl(')') — p1(7q)) < for 7 not belonging 
to f^a- We apply the same decomposition as in Lemma 1 and focus on the 
first part on the right-hand side of (8). Applying the uniform law of large 
numbers and the same transformation as described in Claim 1, for m and n 
sufficiently large, it holds that 

pl(/3,r,a^) -pl(/3o,To,(j^) 
mn 

= m7i^^|21ogL(/3,r, cj^) 

k 

-2\ogL{(i^,To,a'')Y,[PxA\P3fi\)-PxA\m 



+ A2 



max{ j : fij^ / 0} - max{j : (5j / 0} 

■ 3 3 



< E{{-{yl - H*t^,Bif + {yu - H,,t^^A^f)I[y,^t^^ < tq]} 
+ E{{-{yl - Hl_^B,f + [yl - H^.^A^f) 
X I[tq < yi^t^i < T,yi^t^2 < To]} 
+ E{{-iy*, - Hl_^B,f + [yl - H^.^A^f) 
X I[tq < yi^t^i < T,yi^t^2 > To]} 
+ E{{-iy*, - Hl_^B2f + [yl - H^.^A^f) 
X I[yi^t~i > T,yi,t_2 < tq]} 
+ E{{-{y*, - Hl^^B^f + {yl - H^.^A^f) 
X I[yi^t-i >T,To< yi,t-2 < t]} 
+ E{{-{y*, - Hl,_,Bsf + {yl - Hl^^.A^f) 

X I[yi^t~i > T,yi^t-2 > t]} + e. 
Considering the situation where t > tq, we have 

mn ~ ' 

where 

J = E{{-{y*, - Hl,^,B,f + (y* - Hl,_,A,f)I[y,,t^i < tq]} 
+ E{{-{y*, - Hl,_^B,f + {yl - HL.A^f) 
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X I[tq < yi^t^i < T,yi^t^2 < To]} 

+ E{i-{y*, - Hl^^B.f + [yl - H^^^A^f) 

X I[tq < yi^t^i < T,yi^t^2 > To]}. 

When r = oo, the model becomes a hnear mixed model; therefore, by the 
dominated convergence theorem and a similar argument in Samia and Chan 
(2011), it holds almost surely that, for m and n sufficiently large and for 
any M3 > 0, {mn)-^{l{(3,T,a'^) - Z(/3o, tq, cj^)) < for t > tq + M3. Similar 
derivation can be applied to the case r < ro, thus the detail is omitted. 

Following the same argument for Claim 1, the second part on the right- 
hand side of (8) is smaller than with Assumptions A3 and A4. Therefore, 
Lemma 1 holds. 

APPENDIX C: PROOF OF THEOREM 1 

Without loss of generality, the parameter space can be restricted to = 
{7 G : 1/3 — /3q| < 5, |(T^ — o-qI < 5, |r — tqI < 5} according to Lemma 1. To 
simplify the notation, we assume that tq = 0. Because the derivation for 
a general model is lengthy, we consider the same model in Lemma 1, the 
MUTARE model with fc = 2 in (9), and assuming r > 0, we have 

pl(/3,r,a2)-pl(/3,0,a2) 

= 21ogL(/3,r,a2)-21ogL(/3,0,cT2) 

i t 

+ M, - Hl^^B.f - [yl - Hl,^,Bsf]Q2 
+ Mt - Hlt~iB2f - {yl - Hl,_,Bsf]Q3} 

i t 

-{Hl_M2-B2)f]Qi 

+ [1HI_,{B3 - Bi)eu + {Hl,_,iAs - B^)f 

-{Hl,_,{As-Bs)f]Q2 

+ [2H:,_,iB3 - B2)eu + {Hl^AM - B2)f 

-(Hl,^^iA,-B,)f]Q-s}, 

where Qi = /(O < yi^t~i < T,yi^t^2 < 0), Q2 = I{0 < yi^t-i < T,yi^t~2 > 0), 
Q3 = I{t < yi^t-1,0 < yi,t-2 < t). If (5 is sufficiently small, based on Assump- 
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tion A2, we have Ei[(^M-i(^ " Bj)f - {H*t_^{As - Bs))']Qk > 0, for 
k = 1,2,3 and s > j. Therefore, by the same argument in Proposition 1 of 
Chan (1993), it holds that for all e > 0, there exists a T such that with proba- 
bility greater than 1-e, 7 G fi^, r > T/N, implies /(/3, r, a^) -/(/3, 0, ci^) < 0. 
Similar derivation can be extended to the case where r < —T/N . Hence, 
Theorem 1 holds. 



APPENDIX D: PROOF OF THEOREM 2 

We first prove the asymptotic normality. Based on the adaptive lasso 
penalty, 

u = arg min npl (u) , 



where np/ (u) = -2 log L(/3 + u, , r ) + Ai ( | /3j + | ) + A2 maxj {j : /3j + 

Uj 7^ 0}. By the Taylor expansion, we have 

np«(u) = np/(0) -u'^'W^^(c7)(y-5(F,/3,r,cj2)) 



1 r- ' 



N 



+ Ai5]i^,(|/3,+n,|-|/3,|) 
i=i 

+ A2 ( max{j : /3j + Uj 7^ 0} — max{j : fij / 0} 

The last term on the right-hand side equals if Uj = and f3j =0, combining 
with the fact that [Zou (2006)] 



(11) Xiu,{\pj+uj\-\pj\)^v 



{ 0, if ^ 0, 

0, if fij = and Uj = 0, 
00, if Pj = and Uj / 0, 



we have for every u 

npl{u) — npl{0) 

{ -u[,^H{iyW~\a) {Y - g{H, (3, r, a^)) 
' ^U(i))'An(^/]Vu(i)) 



+ 



00, 



if M(2) = 0, 
otherwise. 



By the same argument of Theorem 2 in Zou (2006), the asymptotic normality 
holds by the martingale central limit theorem [Hall and Heyde (1980)]. 
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For consistency, it suffices to show that -P(/3(2) 7^ 0) — t- 0. Using the Karush- 
Kuhn-Tucker (KKT) optimahty conditions, it follows that 

2H{iyW-\a){Y - g{H,^,T,a^)) = Aiz.(i), 
where are the weights corresponding to the first q variables. Note that 

00 [Theorem 2, Zou (2006)] and 2 ^(1)'^-^ W0_-g(g./3.r,a^)) 
totically normal. Therefore, 

^(^2) / 0) < P(2i?(l)'W-^(a)(y - giH,P,T, a^)) = Aiz.(i)) ^ 0, 
and Theorem 2 holds. 
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